UFO Sightings - Data Analysis project

Prepared by Kourosh Khedri

1- Exploring Dataset

Loading the dataset into a pandas DataFrame and checking what the first and last 10 rows look like:

In [1]:
import pandas as pd 
In [2]:
ufo_data = pd.read_csv("nuforc_reports.csv")
In [7]:
ufo_data.head(10)
Out[7]:
summary city state date_time shape duration stats report_link text posted city_latitude city_longitude
0 My wife was driving southeast on a fairly popu... Chester VA 2019-12-12T18:43:00 light 5 seconds Occurred : 12/12/2019 18:43 (Entered as : 12/... http://www.nuforc.org/webreports/151/S151739.html My wife was driving southeast on a fairly popu... 2019-12-22T00:00:00 37.343152 -77.408582
1 I think that I may caught a UFO on the NBC Nig... Rocky Hill CT 2019-03-22T18:30:00 circle 3-5 seconds Occurred : 3/22/2019 18:30 (Entered as : 03/2... http://www.nuforc.org/webreports/145/S145297.html I think that I may caught a UFO on the NBC Nig... 2019-03-29T00:00:00 41.664800 -72.639300
2 I woke up late in the afternoon 3:30-4pm. I we... NaN NaN NaN NaN NaN Occurred : 4/1/2019 15:45 (Entered as : April... http://www.nuforc.org/webreports/145/S145556.html I woke up late in the afternoon 3:30-4pm. I w... NaN NaN NaN
3 I was driving towards the intersection of fall... Ottawa ON 2019-04-17T02:00:00 teardrop 10 seconds Occurred : 4/17/2019 02:00 (Entered as : 04-1... http://www.nuforc.org/webreports/145/S145697.html I was driving towards the intersection of fall... 2019-04-18T00:00:00 45.381383 -75.708501
4 In Peoria Arizona, I saw a cigar shaped craft ... Peoria NY 2009-03-15T18:00:00 cigar 2 minutes Occurred : 3/15/2009 18:00 (Entered as : 03/1... http://www.nuforc.org/webreports/145/S145723.html In Peoria, Arizona, I saw a cigar shaped craft... 2019-04-18T00:00:00 NaN NaN
5 The object has flashing lights that are green,... Kirbyville TX 2019-04-02T20:25:00 disk 15 minutes Occurred : 4/2/2019 20:25 (Entered as : 04/02... http://www.nuforc.org/webreports/145/S145476.html The object has flashing lights that are green,... 2019-04-08T00:00:00 30.677200 -94.005200
6 Description is the same as Washington DC event... Tucson AZ 2019-05-01T11:00:00 unknown 5 minutes Occurred : 5/1/2019 11:00 (Entered as : 5/1/1... http://www.nuforc.org/webreports/145/S145947.html Description is the same as Washington, DC, eve... 2019-05-09T00:00:00 32.259941 -110.927542
7 Apr. 10th we witnessed a very bright silvery r... Gold Canyon AZ 2019-04-10T17:00:00 circle 10 minutes Occurred : 4/10/2019 17:00 (Entered as : 04/1... http://www.nuforc.org/webreports/145/S145766.html Apr. 10th we witnessed a very bright silvery r... 2019-04-25T00:00:00 33.371500 -111.436900
8 Ufos report in Irving Texas at 2200 hrs. On or... Dallas TX 1973-07-14T22:00:00 oval 6 minutes Occurred : 7/14/1973 22:00 (Entered as : 07/1... http://www.nuforc.org/webreports/145/S145751.html Ufos report in Irving Texas at 2200 hrs. On o... 2019-04-25T00:00:00 32.835168 -96.808118
9 Group of lights formation sweeping thru a nigh... Caloocan City (Philippines) NaN 2019-06-06T19:00:00 other 19:00 to 19:30 Occurred : 6/6/2019 19:00 (Entered as : 6/6/2... http://www.nuforc.org/webreports/146/S146694.html group of lights formation sweeping thru a nigh... 2019-06-07T00:00:00 NaN NaN
In [8]:
ufo_data.tail(10)
Out[8]:
summary city state date_time shape duration stats report_link text posted city_latitude city_longitude
88115 Cigar shapes in Nassau Oceanside NY 2019-10-02T05:33:00 cigar 30 seconds Occurred : 10/2/2019 05:33 (Entered as : 10/0... http://www.nuforc.org/webreports/149/S149417.html Cigar shapes in Nassau Walking home from work ... 2019-10-04T00:00:00 40.634300 -73.638100
88116 Stepped out into our backyard to let out the d... Clarksville AR 2019-10-02T06:35:00 light 5 seconds Occurred : 10/2/2019 06:35 (Entered as : 10/2... http://www.nuforc.org/webreports/149/S149406.html Stepped out into our backyard to let out the d... 2019-10-04T00:00:00 35.507100 -93.511900
88117 On history channel there should be a contact #... Hayfork CA 2019-10-02T12:00:00 NaN 7 minutes Occurred : 10/2/2019 12:00 (Entered as : 10 2... http://www.nuforc.org/webreports/149/S149475.html On history channel there should be a contact #... 2019-10-04T00:00:00 NaN NaN
88118 Bright light ascending directly upwards throug... Calmar AB 2019-10-02T19:00:00 light >10 minutes Occurred : 10/2/2019 19:00 (Entered as : 10/0... http://www.nuforc.org/webreports/149/S149414.html Bright light ascending directly upwards throug... 2019-10-04T00:00:00 53.250000 -113.783300
88119 there was a stationary orange light in the eas... Morgan City LA 2019-10-02T19:15:00 light 2 minutes Occurred : 10/2/2019 19:15 (Entered as : 10/0... http://www.nuforc.org/webreports/149/S149421.html there was a stationary orange light in the eas... 2019-10-04T00:00:00 29.699692 -91.069123
88120 4 lights in formation over Tempe appear while ... Tempe AZ 2019-10-02T20:00:00 formation 3 minutes Occurred : 10/2/2019 20:00 (Entered as : 10/2... http://www.nuforc.org/webreports/149/S149463.html 4 lights in formation over Tempe appear while ... 2019-10-04T00:00:00 33.414036 -111.920920
88121 2 bright star like lights in the NNW skys, ((... Bolivar MO 2019-10-02T20:00:00 light 20 seconds Occurred : 10/2/2019 20:00 (Entered as : 10/0... http://www.nuforc.org/webreports/149/S149405.html 2 bright star like lights in the NNW sky two b... 2019-10-04T00:00:00 37.642200 -93.399600
88122 I just witnessed a ‘Phoenix Lights’ type of fo... North Port FL 2019-10-02T20:03:00 formation 20 seconds Occurred : 10/2/2019 20:03 (Entered as : 10/0... http://www.nuforc.org/webreports/149/S149424.html 10/2/19 @ 8:03PM EST UFO SIGHTING in the 3428... 2019-10-04T00:00:00 27.076210 -82.223280
88123 Witnessed an orange, slow moving light. Was lo... Black Mountain NC 2019-10-02T22:00:00 fireball 2 minutes Occurred : 10/2/2019 22:00 (Entered as : 10/0... http://www.nuforc.org/webreports/149/S149447.html Witnessed an orange, slow moving light. Was lo... 2019-10-04T00:00:00 35.605000 -82.313200
88124 Glowing lights in formation just south of San... Marin County CA 2019-10-02T22:00:00 sphere 3 minutes Occurred : 10/2/2019 22:00 (Entered as : 10-0... http://www.nuforc.org/webreports/149/S149436.html Glowing lights in formation just south of San... 2019-10-04T00:00:00 NaN NaN

Checking whether the dataset contains missing values and what the shape of the data table is:

In [10]:
ufo_data.isnull().sum()
Out[10]:
summary              30
city                234
state              5235
date_time          1187
shape              2498
duration           3171
stats                37
report_link           0
text                 55
posted             1187
city_latitude     16112
city_longitude    16112
dtype: int64
In [11]:
ufo_data.shape
Out[11]:
(88125, 12)
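The raw counts above are easier to judge as a share of all rows. A minimal sketch, assuming a small made-up frame (`toy`) in place of `ufo_data`; the values are illustrative only and not taken from the dataset:

```python
import pandas as pd

# Toy frame standing in for ufo_data; the values are made up for illustration.
toy = pd.DataFrame({
    "city": ["Chester", None, "Ottawa", "Peoria"],
    "state": ["VA", None, "ON", "NY"],
    "city_latitude": [37.34, None, 45.38, None],
})

# Share of missing values per column, in percent, largest first.
missing_pct = (toy.isnull().mean() * 100).sort_values(ascending=False)
print(missing_pct)

# Rows that would survive a dropna() restricted to the columns of interest.
kept = toy.dropna(subset=["city", "state"])
print(len(kept), "of", len(toy), "rows kept")
```

Restricting `dropna` with `subset` keeps rows that are only missing values in columns the analysis does not use.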

2- Data Cleaning

Keeping only the columns that I would like to clean and analyze:

In [12]:
# Leaving only the necessary columns
df1 = ufo_data[['city', 'state', 'date_time', 'shape', 'text']]
In [13]:
df1.head(5)
Out[13]:
   city        state  date_time            shape     text
0  Chester     VA     2019-12-12T18:43:00  light     My wife was driving southeast on a fairly popu...
1  Rocky Hill  CT     2019-03-22T18:30:00  circle    I think that I may caught a UFO on the NBC Nig...
2  NaN         NaN    NaN                  NaN       I woke up late in the afternoon 3:30-4pm. I w...
3  Ottawa      ON     2019-04-17T02:00:00  teardrop  I was driving towards the intersection of fall...
4  Peoria      NY     2009-03-15T18:00:00  cigar     In Peoria, Arizona, I saw a cigar shaped craft...

Missing values treatment:

In [14]:
# Removing rows with missing values
df1 = df1.dropna(axis=0).reset_index(drop=True)
In [16]:
df1.shape
Out[16]:
(79537, 5)

Other data cleaning considerations:

In [17]:
# Fixing an abbreviation duplication issue
df1['state'] = df1['state'].apply(lambda x: 'QC' if x=='QB' else x)
In [18]:
# Creating a list of Canadian provinces
canada = ['ON', 'QC', 'AB', 'BC', 'NB', 'MB',
          'NS', 'SK', 'NT', 'NL', 'YT', 'PE']  
In [19]:
# Creating new columns: `country`, `year`, `month`, and `time`
df1['country'] = df1['state'].apply(\
                  lambda x: 'Canada' if x in canada else 'USA')
df1['year'] = df1['date_time'].apply(lambda x: x[:4]).astype(int)
df1['month'] = df1['date_time'].apply(lambda x: x[5:7]).astype(int)
df1['month'] = df1['month'].replace({1: 'Jan', 2: 'Feb', 3: 'Mar', 
                                   4: 'Apr', 5: 'May', 6: 'Jun',
                                   7: 'Jul', 8: 'Aug', 9: 'Sep', 
                                   10: 'Oct', 11: 'Nov', 12: 'Dec'})
df1['time'] = df1['date_time'].apply(lambda x: x[-8:-6]).astype(int)
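The string slicing above works because every remaining `date_time` value has the same ISO layout. An alternative sketch, assuming a few made-up timestamps in that layout, parses the column once with `pd.to_datetime` and reads the parts through the `.dt` accessor. The column name `hour` is an illustrative choice; the notebook stores the same value under `time`.

```python
import pandas as pd

# Made-up timestamps in the same ISO format as the date_time column.
sample = pd.DataFrame({
    "date_time": ["2019-12-12T18:43:00", "2009-03-15T18:00:00", "1973-07-14T22:00:00"]
})

# Parse once, then read the individual parts through the .dt accessor.
parsed = pd.to_datetime(sample["date_time"])
sample["year"] = parsed.dt.year
sample["month"] = parsed.dt.strftime("%b")  # abbreviated month name (locale-dependent)
sample["hour"] = parsed.dt.hour
print(sample)
```

Parsing also surfaces malformed timestamps as errors instead of silently slicing the wrong characters.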
In [22]:
# Dropping an already used column
df1 = df1.drop(['date_time'], axis=1)
In [23]:
# Dropping duplicated rows
df1 = df1.drop_duplicates().reset_index(drop=True)
In [24]:
df1.head(20)
Out[24]:
    city             state  shape      text                                               country  year  month  time
0   Chester          VA     light      My wife was driving southeast on a fairly popu...  USA      2019  Dec    18
1   Rocky Hill       CT     circle     I think that I may caught a UFO on the NBC Nig...  USA      2019  Mar    18
2   Ottawa           ON     teardrop   I was driving towards the intersection of fall...  Canada   2019  Apr    2
3   Peoria           NY     cigar      In Peoria, Arizona, I saw a cigar shaped craft...  USA      2009  Mar    18
4   Kirbyville       TX     disk       The object has flashing lights that are green,...  USA      2019  Apr    20
5   Tucson           AZ     unknown    Description is the same as Washington, DC, eve...  USA      2019  May    11
6   Gold Canyon      AZ     circle     Apr. 10th we witnessed a very bright silvery r...  USA      2019  Apr    17
7   Dallas           TX     oval       Ufos report in Irving Texas at 2200 hrs. On o...   USA      1973  Jul    22
8   Brookville       IN     sphere     Metal orb of wires that was seen through a tel...  USA      2019  Jun    21
9   Melbourne Beach  FL     unknown    We think 2 UFOs....2 tiny lights recorded for ...  USA      2019  Jun    22
10  Carrizozo        NM     changing   I was driving and saw three glowing orbs is th...  USA      2019  Jun    22
11  Waco             TX     circle     I was in pool and my wife was sitting on the e...  USA      2018  Jun    1
12  Centerville      IA     circle     Bright Circle of Light followed me from Oskalo...  USA      1999  Aug    2
13  Gray Court       SC     light      Strange bright light hovered over mobile home....  USA      1975  Jul    0
14  Yuba City        CA     formation  There were 4 lights in diagonal formation that...  USA      2019  Aug    0
15  Abilene          TX     light      star-like light that started bouncing in the a...  USA      2019  Aug    1
16  Leyner           CO     light      There where 4 bright lights in a shape of a di...  USA      2019  Aug    20
17  Catalina         AZ     cigar      My wife and I were taking our usual evening st...  USA      2019  Aug    20
18  Santa Barbara    CA     flash      Abnormal flashing object in SoCal As an avid s...  USA      2019  Aug    20
19  Charlestown      RI     chevron    Very unusual, extreme chevron-shaped aircraft ...  USA      2019  Aug    14
In [25]:
round(df1['country'].value_counts(normalize=True)*100)
Out[25]:
USA       96.0
Canada     4.0
Name: country, dtype: float64

3- Data Visualization

Visualizing the frequency of UFO occurrences in each month:

In [26]:
import matplotlib.pyplot as plt
import seaborn as sns
In [27]:
# Creating a series object for UFO occurrences by month, in %
months = df1['month'].value_counts(normalize=True)\
           [['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
             'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']]*100
In [28]:
# Defining a function for creating and customizing a figure in matplotlib
def create_customized_fig():
    fig, ax = plt.subplots(figsize=(12,6))
    plt.title('UFO occurrences by month, %', fontsize=27)
    plt.ylim(0,15)
    plt.xticks(fontsize=20)
    plt.yticks(fontsize=20)
    ax.tick_params(bottom=False)
    sns.despine()
    return fig, ax
In [30]:
# PLOTTING
create_customized_fig()

# Creating a stem plot
plt.stem(months.index, months) 

plt.show()
<ipython-input-30-7d6ba4b04b9f>:5: UserWarning: In Matplotlib 3.3 individual lines on a stem plot will be added as a LineCollection instead of individual lines. This significantly improves the performance of a stem plot. To remove this warning and switch to the new behaviour, set the "use_line_collection" keyword argument to True.
  plt.stem(months.index, months)
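The monthly shares above are put into calendar order by indexing the `value_counts` result with a list of labels, which raises a `KeyError` if any month is absent from the data. A hedged alternative sketch, assuming a few made-up month labels, uses `reindex` so that missing months show up as 0 instead:

```python
import pandas as pd

MONTH_ORDER = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Made-up month labels; note that several months never occur.
observed = pd.Series(["Jul", "Jul", "Aug", "Jan", "Jul", "Dec", "Aug", "Jul"])

# reindex puts the shares in calendar order and fills absent months with 0
# instead of raising a KeyError.
shares = (observed.value_counts(normalize=True) * 100).reindex(MONTH_ORDER, fill_value=0.0)
print(shares.round(1))
```

For the full dataset every month is present, so both approaches should give the same series.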

UFO shape frequency distribution, to check whether some shapes are more common than the others:

In [31]:
# Creating a series of shapes and their frequencies 
# in ascending order
shapes = df1['shape'].value_counts(normalize=True,
                                  ascending=True)*100
fig, ax = plt.subplots(figsize=(12,9))
# Creating a horizontal stem plot
plt.hlines(y=shapes.index, 
           xmin=0, xmax=shapes, 
           color='slateblue',
           linestyle='dotted', linewidth=5)
plt.plot(shapes, shapes.index, 
         '*', ms=17, 
         c='darkorange')
plt.title('UFO shapes by sighting frequency, %', fontsize=29)
plt.xlim(0,25)
plt.yticks(fontsize=20)
plt.xticks(fontsize=20)
ax.tick_params()
sns.despine()
plt.show()

According to the witnesses, UFOs can take a wide range of incredible forms, including diamonds, cigars, chevrons, teardrops, and crosses. By far the most frequent form (22%), however, is described as just a light.

Visualizing the most commonly used words in the USA and Canada via word clouds:

In [32]:
from wordcloud import WordCloud, STOPWORDS
# Gathering sighting descriptions from all American witnesses
text = ''
for t in df1[df1['country']=='USA'].loc[:, 'text']:
    text += ' ' + t
fig = plt.subplots(figsize=(10,10)) 
# Creating a basic word cloud
wordcloud = WordCloud(width=1000, height=1000, 
                      collocations=False).generate(text)
plt.title('USA collective description of UFO', fontsize=27)
plt.imshow(wordcloud)
plt.axis('off')
plt.show()
# Saving the word cloud
wordcloud.to_file('wordcloud_usa.png')
Out[32]:
<wordcloud.wordcloud.WordCloud at 0x25c2e76c790>

The most common words are "light", "object", and "sky", followed by "bright", "time", "moving", "white", "red", "craft", "star". Among the most frequent words, there are some uninformative ones, like "one", "second", "saw", "see", "seen", "looked", etc. We can assume that American witnesses mostly observed bright craft objects of white or red color, moving in the sky and emitting light.

In [33]:
# Gathering sighting descriptions from all Canadian witnesses
text = ''
for t in df1[df1['country']=='Canada'].loc[:, 'text']:
    text += ' ' + t
# Creating a user stopword list
stopwords = ['one', 'two', 'first', 'second', 'saw', 'see', 'seen',
             'looked', 'looking', 'look', 'went', 'minute', 'back', 
             'noticed', 'north', 'south', 'east', 'west', 'nuforc',
             'appeared', 'shape', 'side', 'witness', 'sighting', 
             'going', 'note', 'around', 'direction', 'approximately',
             'still', 'away', 'across', 'seemed', 'time']
fig = plt.subplots(figsize=(10,10)) 
# Creating and customizing a word cloud
wordcloud = WordCloud(width=1000, height=1000, 
                      collocations=False,
                      colormap='cool',
                      background_color='yellow',
                      stopwords=STOPWORDS.union(stopwords),  # union() returns a new set; update() returns None
                      prefer_horizontal=0.85,
                      random_state=100,
                      max_words=100,
                      min_word_length=3).generate(text)
plt.title('Canadian collective description of UFO', fontsize=27)
plt.imshow(wordcloud)
plt.axis('off')
plt.show()
# Saving the word cloud
wordcloud.to_file('wordcloud_canada.png')
Out[33]:
<wordcloud.wordcloud.WordCloud at 0x25c2e6fe9d0>

It seems that the descriptions given by Canadian people are rather similar to those from Americans, with the addition of some other frequent words: "orange", "plane", "night", "minutes", "seconds", "cloud", "flying", "speed", "sound". We can assume that Canadians witnessed bright craft objects, of white, red, or orange color, mostly at night time, moving/flying in the sky and emitting light and, probably, sound. At first, the objects looked like stars, planes, or clouds, and the whole process lasted several seconds to minutes.

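Part of the difference between the Canadian and American collective descriptions probably comes from the custom stopword list, which was applied only to the Canadian word cloud; extending that list and applying it to both countries would make the comparison fairer. Or, maybe, “Canadian” aliens really are more orange, plane- or cloud-like, and noisy.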
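To compare the two countries on equal terms, the same stopword set can be applied to both texts before counting. The sketch below is only an illustration with the standard library. `BASE_STOPWORDS`, `CUSTOM_STOPWORDS`, `descriptions`, and `top_words` are hypothetical names, and the two sentences are made up; it does not reproduce how the `wordcloud` package tokenizes text.

```python
import re
from collections import Counter

# Hypothetical stand-ins: a tiny base stopword set and two made-up descriptions.
BASE_STOPWORDS = {"a", "the", "in", "and", "i", "was", "it"}
CUSTOM_STOPWORDS = {"saw", "looked", "time"}
descriptions = [
    "I saw a bright orange light in the sky",
    "The light was bright and it looked orange",
]

def top_words(texts, stopwords, n=3):
    """Count lowercase word tokens, skipping stopwords, and return the n most common."""
    words = re.findall(r"[a-z']+", " ".join(texts).lower())
    return Counter(w for w in words if w not in stopwords).most_common(n)

# The | operator (set union) returns a new set; set.update() changes the set
# in place and returns None.
combined = BASE_STOPWORDS | CUSTOM_STOPWORDS
print(top_words(descriptions, combined))
```

Calling `top_words` once per country with the same `combined` set yields word rankings that differ only because of the texts, not because of the filtering.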

Visualizing UFO sighting frequencies by state in the USA:

In [35]:
pip install squarify
Collecting squarify
  Downloading squarify-0.4.3-py3-none-any.whl (4.3 kB)
Installing collected packages: squarify
Successfully installed squarify-0.4.3
Note: you may need to restart the kernel to use updated packages.
In [36]:
import squarify
# Extract the data
states = df1[df1['country']=='USA'].loc[:, 'state'].value_counts()
fig = plt.subplots(figsize=(12,6))
# Creating a treemap
squarify.plot(sizes=states.values, label=states.index)
plt.title('UFO sighting frequencies by state, the USA', fontsize=27)
plt.axis('off')
plt.show()

Looks like California is a real extraterrestrial base in the USA! It is followed, at a considerable distance, by Florida, Washington, and Texas, while the District of Columbia and Puerto Rico are visited by UFOs very rarely.

Visualizing the hours of the day when UFOs were seen most and least often:

In [37]:
import matplotlib
# Extracting the data
hours = df1['time'].value_counts()
# Creating a list of colors from 2 matplotlib colormaps 
# `Set3` and `tab20`
cmap1 = matplotlib.cm.Set3
cmap2 = matplotlib.cm.tab20
colors = []
for i in range(len(hours.index)):
    colors.append(cmap1(i))
    if cmap2(i) not in colors:
        colors.append(cmap2(i))
        
fig = plt.subplots(figsize=(12,6))
# Creating and customizing a treemap
squarify.plot(sizes=hours.values, label=hours.index,
              color=colors, alpha=0.8, 
              pad=True,
              text_kwargs={'color': 'indigo',
                           'fontsize': 20, 
                           'fontweight': 'bold'})
plt.title('UFO sighting frequencies by hour', fontsize=27)
plt.axis('off')
plt.show()

The respondents from our dataset mostly observed UFOs in the time range from 20:00 till 23:00, or, more generally, from 19:00 till midnight. The least “UFO-prone” hours are 07:00–09:00. However, this doesn’t necessarily mean a “lack of aliens” in certain hours of the day and can instead be explained more pragmatically: people usually have free time in the evening after work, while in the morning the majority of people are going to work and are a bit too immersed in their thoughts to notice interesting phenomena around them.
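The evening concentration described above can also be expressed as shares per part of the day. A minimal sketch, assuming made-up hour values and hypothetical bin edges (night 0-5, morning 6-11, afternoon 12-17, evening 18-23):

```python
import pandas as pd

# Made-up hours of day (0-23), standing in for the `time` column.
hours = pd.Series([20, 21, 22, 23, 21, 8, 14, 2, 19, 22])

# Hypothetical bin edges: [0, 6) night, [6, 12) morning,
# [12, 18) afternoon, [18, 24) evening.
periods = pd.cut(hours, bins=[0, 6, 12, 18, 24], right=False,
                 labels=["night", "morning", "afternoon", "evening"])

# Percentage of sightings falling into each part of the day.
share = periods.value_counts(normalize=True).mul(100).round(1)
print(share)
```

With `right=False` each bin includes its left edge, so hour 18 counts as evening and hour 0 as night.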

Visualizing UFO shape appearances between 2015 and 2019:

In [39]:
pip install matplotlib-venn
Collecting matplotlib-venn
  Downloading matplotlib-venn-0.11.6.tar.gz (29 kB)
Requirement already satisfied: matplotlib in c:\users\kouro\anaconda3\lib\site-packages (from matplotlib-venn) (3.2.2)
Requirement already satisfied: numpy in c:\users\kouro\anaconda3\lib\site-packages (from matplotlib-venn) (1.20.2)
Requirement already satisfied: scipy in c:\users\kouro\anaconda3\lib\site-packages (from matplotlib-venn) (1.5.0)
Requirement already satisfied: cycler>=0.10 in c:\users\kouro\anaconda3\lib\site-packages (from matplotlib->matplotlib-venn) (0.10.0)
Requirement already satisfied: python-dateutil>=2.1 in c:\users\kouro\anaconda3\lib\site-packages (from matplotlib->matplotlib-venn) (2.8.1)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\kouro\anaconda3\lib\site-packages (from matplotlib->matplotlib-venn) (1.2.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in c:\users\kouro\anaconda3\lib\site-packages (from matplotlib->matplotlib-venn) (2.4.7)
Requirement already satisfied: six in c:\users\kouro\anaconda3\lib\site-packages (from cycler>=0.10->matplotlib->matplotlib-venn) (1.15.0)
Building wheels for collected packages: matplotlib-venn
  Building wheel for matplotlib-venn (setup.py): started
  Building wheel for matplotlib-venn (setup.py): finished with status 'done'
  Created wheel for matplotlib-venn: filename=matplotlib_venn-0.11.6-py3-none-any.whl size=32067 sha256=b21b01ce3c59aca3e9f8db235fb34eaa8fff721b8386784ba82d0e1571da5864
  Stored in directory: c:\users\kouro\appdata\local\pip\cache\wheels\82\e4\64\dd790d424818bc2f59c11471a1eee5dc8cfcd3f8ee8c4812fa
Successfully built matplotlib-venn
Installing collected packages: matplotlib-venn
Successfully installed matplotlib-venn-0.11.6
Note: you may need to restart the kernel to use updated packages.
In [40]:
from matplotlib_venn import venn2, venn3, venn3_circles
# Creating the subsets for crosses and cigars
crosses = df1[(df1['shape']=='cross')&\
             (df1['year']>=2015)&(df1['year']<=2019)].loc[:, 'city']
cigars = df1[(df1['shape']=='cigar')&\
            (df1['year']>=2015)&(df1['year']<=2019)].loc[:, 'city']
fig = plt.subplots(figsize=(12,8))
# Creating a Venn diagram
venn2(subsets=[set(crosses), set(cigars)], 
      set_labels=['Crosses', 'Cigars'])
plt.title('Crosses and cigars by number of cities, 2015-2019', 
          fontsize=27)
plt.show()

In the period from 2015 to 2019 inclusive, there were 18 cities in North America where both crosses and cigars were registered. Of these 2 shapes, only crosses were observed in 79 cities and only cigars in 469.

In [41]:
# Creating a subset for diamonds
diamonds = df1[(df1['shape']=='diamond')&\
              (df1['year']>=2015)&(df1['year']<=2019)].loc[:, 'city']
# Creating a list of subsets
subsets=[set(crosses), set(cigars), set(diamonds)]
fig = plt.subplots(figsize=(15,10))
# Creating a Venn diagram for the 3 subsets
venn3(subsets=subsets, 
      set_labels=['Crosses', 'Cigars', 'Diamonds'],
      set_colors=['magenta', 'dodgerblue', 'gold'],
      alpha=0.3)
# Customizing the circumferences of the circles 
venn3_circles(subsets=subsets,
              color='darkviolet', alpha=0.9, 
              ls='dotted', lw=4)
plt.title('Crosses, cigars, and diamonds \nby number of cities, 2015-2019', fontsize=26)
plt.show()

Hence, in the period of interest there were 6 cities in North America where all 3 shapes were registered, 66 cities where only cigars and diamonds were registered, 260 where only diamonds were registered, etc.
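The numbers shown in a three-set Venn diagram correspond to seven disjoint regions, each of which can be computed with plain set operations. A small sketch with made-up placeholder sets (the letters are not city names from the dataset):

```python
# Made-up city sets; the names are placeholders, not results from the dataset.
crosses = {"A", "B", "C"}
cigars = {"B", "C", "D", "E"}
diamonds = {"C", "E", "F"}

# The seven disjoint regions of a three-set Venn diagram.
regions = {
    "crosses only": crosses - cigars - diamonds,
    "cigars only": cigars - crosses - diamonds,
    "diamonds only": diamonds - crosses - cigars,
    "crosses & cigars only": (crosses & cigars) - diamonds,
    "crosses & diamonds only": (crosses & diamonds) - cigars,
    "cigars & diamonds only": (cigars & diamonds) - crosses,
    "all three": crosses & cigars & diamonds,
}
for name, members in regions.items():
    print(f"{name}: {len(members)}")
```

Because the regions are disjoint, their sizes add up to the size of the union, which is a handy sanity check on the diagram.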

Checking which 6 cities have all 3 shapes in common:

In [42]:
print(set(crosses) & set(cigars) & set(diamonds))
{'New York', 'Albuquerque', 'Savannah', 'Staten Island', 'Lakewood', 'Rochester'}

All of them are located in the USA.
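One caveat: the sets above are built from city names alone, so two different places that share a name count as the same city. A hedged sketch with made-up records shows how keying on the (city, state) pair changes the intersection. `records` and `places` are hypothetical names used only for this illustration.

```python
# Made-up (city, state, shape) records; two different Lakewoods on purpose.
records = [
    ("Lakewood", "CO", "cross"),
    ("Lakewood", "NJ", "cigar"),
    ("Rochester", "NY", "cross"),
    ("Rochester", "NY", "cigar"),
]

def places(shape, key):
    """Return the set of place keys at which the given shape was reported."""
    return {key(r) for r in records if r[2] == shape}

# Intersection by city name only versus by (city, state) pair.
by_name = places("cross", lambda r: r[0]) & places("cigar", lambda r: r[0])
by_name_and_state = places("cross", lambda r: r[:2]) & places("cigar", lambda r: r[:2])
print(sorted(by_name))
print(sorted(by_name_and_state))
```

In the notebook's data frame the equivalent would be to build the sets from both the `city` and `state` columns rather than from `city` alone.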

Box Plot Visualization

In [43]:
df1[df1['country']=='Canada'].loc[:, 'state'].value_counts()[:3]
Out[43]:
ON    1363
BC     451
AB     369
Name: state, dtype: int64
In [59]:
# Extracting the data for cylinders and cones 
# from California and Ontario
CA_ON_cyl_con = df1[((df1['state']=='CA')|(df1['state']=='ON'))&((df1['shape']=='cylinder')|(df1['shape']=='cone'))]
fig = plt.subplots(figsize=(12,7))
sns.set(style='white')
# Creating swarm plots
sns.swarmplot(data=CA_ON_cyl_con, x='year', y='state', palette=['deeppink', 'blue'])
# Creating box plots
sns.boxplot(data=CA_ON_cyl_con, x='year', y='state', palette=['palegreen', 'lemonchiffon'])
plt.title('Cylinders and cones in California and Ontario', fontsize=29)
plt.xlabel('Years', fontsize=18)
plt.ylabel('States', fontsize=18)
sns.despine()
plt.show()

The following observations can be made:

- Since the numeric variable in question (year) is an integer, the data points are aligned.

- The two subsets differ considerably in sample size.

- The Californian subset is heavily left-skewed and contains a lot of outliers.

- We definitely should add to our “wish list” the possibility to distinguish between cylinders and cones for each subset.
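The skew and the outliers noted above can be checked numerically. Box plots in matplotlib and seaborn by default draw whiskers up to 1.5 times the interquartile range (IQR) beyond the quartiles and mark points outside as outliers. A minimal sketch of that rule, assuming a made-up series of sighting years:

```python
import pandas as pd

# Made-up sighting years with a long tail toward the past.
years = pd.Series([1965, 1978, 2004, 2008, 2008, 2010, 2012, 2015, 2015, 2019])

# Quartiles and the 1.5 * IQR fences used for box plot whiskers.
q1, q3 = years.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Points outside the fences are the ones drawn as outliers.
outliers = years[(years < lower) | (years > upper)]
print(f"fences: [{lower}, {upper}]; outliers: {outliers.tolist()}")
print("skewness:", round(years.skew(), 2))  # a negative value means left-skewed
```

The lower fence also suggests a data-driven choice for the left limit of the x-axis when the outliers are cropped from the plot.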

Excluding the outliers from the visualization and distinguishing the two shapes by color:

In [60]:
fig = plt.subplots(figsize=(12,7))
# Creating swarm plots
sns.swarmplot(data=CA_ON_cyl_con, x='year', y='state', palette=['deeppink', 'blue'], hue='shape')
# Creating box plots
sns.boxplot(data=CA_ON_cyl_con, x='year', y='state', palette=['palegreen', 'lemonchiffon'])
plt.title('Cylinders and cones in California and Ontario', fontsize=29)
plt.xlim(1997,2020)
plt.xlabel('Years', fontsize=18)
plt.ylabel('States', fontsize=18)
plt.legend(loc='upper left', frameon=False, fontsize=15)
sns.despine()
plt.show()

Both plots show that the vast majority of UFOs in these 2 subsets are cylinders. For the Californian subset, we can distinguish years with particularly frequent occurrences of cylindrical or conical UFOs: 2008, 2015, and 2019. Moreover, in 2015 we observe an unexpected boom of cones, even though they are much rarer in general.
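Reading such peaks off a swarm plot can be backed up with a table of counts per year and shape. A sketch using `pd.crosstab`, assuming a small made-up frame (`subset`) in place of `CA_ON_cyl_con`:

```python
import pandas as pd

# Made-up subset standing in for CA_ON_cyl_con.
subset = pd.DataFrame({
    "year":  [2008, 2008, 2015, 2015, 2015, 2019, 2019],
    "shape": ["cylinder", "cylinder", "cone", "cone", "cylinder", "cylinder", "cone"],
    "state": ["CA", "CA", "CA", "CA", "ON", "CA", "ON"],
})

# Rows are years, columns are shapes, cells are sighting counts.
counts = pd.crosstab(subset["year"], subset["shape"])
print(counts)
print("year with most cones:", counts["cone"].idxmax())
```

Filtering `subset` by `state` before the cross-tabulation gives the same table for California or Ontario alone.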

Visualizing the UFO occurrences on the map:

In [62]:
pip install folium
Collecting folium
  Downloading folium-0.12.1-py2.py3-none-any.whl (94 kB)
Requirement already satisfied: requests in c:\users\kouro\anaconda3\lib\site-packages (from folium) (2.24.0)
Requirement already satisfied: jinja2>=2.9 in c:\users\kouro\anaconda3\lib\site-packages (from folium) (2.11.2)
Requirement already satisfied: numpy in c:\users\kouro\anaconda3\lib\site-packages (from folium) (1.20.2)
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Requirement already satisfied: chardet<4,>=3.0.2 in c:\users\kouro\anaconda3\lib\site-packages (from requests->folium) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in c:\users\kouro\anaconda3\lib\site-packages (from requests->folium) (2.10)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\kouro\anaconda3\lib\site-packages (from requests->folium) (2020.12.5)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in c:\users\kouro\anaconda3\lib\site-packages (from requests->folium) (1.25.9)
Requirement already satisfied: MarkupSafe>=0.23 in c:\users\kouro\anaconda3\lib\site-packages (from jinja2>=2.9->folium) (1.1.1)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1
Note: you may need to restart the kernel to use updated packages.
In [63]:
import folium
from folium.plugins import HeatMap
In [64]:
df_plot = ufo_data[~ufo_data.city_latitude.isna()]
df_plot.shape
Out[64]:
(72013, 12)
In [65]:
# heatmap of locations for first overview
zoom_factor = 2 # initial map size
my_map_1 = folium.Map(location=[0,0], zoom_start=zoom_factor)
HeatMap(data=df_plot[['city_latitude', 'city_longitude']], radius=10).add_to(my_map_1)
my_map_1 # display

We can see the location of each occurrence on the map and check where the sightings are most concentrated.
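Because coordinates are recorded per city, many reports share exactly the same point. As far as I know, folium's `HeatMap` accepts rows of `[lat, lng]` or `[lat, lng, weight]`, so one option is to aggregate the reports per location first. The sketch below only prepares such rows from made-up coordinates and does not draw a map; `points`, `weighted`, and `heat_rows` are illustrative names.

```python
import pandas as pd

# Made-up coordinates; the column names match ufo_data.
points = pd.DataFrame({
    "city_latitude":  [37.34, 37.34, 41.66, 37.34, None],
    "city_longitude": [-77.41, -77.41, -72.64, -77.41, -75.00],
})

# One row per distinct location, with the number of reports as a weight.
weighted = (points.dropna(subset=["city_latitude", "city_longitude"])
                  .groupby(["city_latitude", "city_longitude"])
                  .size()
                  .reset_index(name="weight"))
heat_rows = weighted.values.tolist()  # [[lat, lon, weight], ...]
print(heat_rows)
```

Aggregating first reduces the number of points sent to the browser, which can matter for tens of thousands of rows.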

Visualizing the trend from 1970 to 2019:

In [70]:
pd.pivot_table(df1,index='year',values='state',aggfunc='count').plot(figsize=(10,6))
plt.title('1970 - 2019')
Out[70]:
Text(0.5, 1.0, '1970 - 2019')

There are big jumps in 2005 and 2010, and a significant decline in 2014. It seems that there might be an increase in incidents in January 2020.
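Jumps and declines like these can be quantified as year-over-year changes, and the series can be restricted explicitly to the 1970-2019 window named in the title. A minimal sketch, assuming a made-up series of years in place of `df1['year']`:

```python
import pandas as pd

# Made-up years; 1965 falls outside the window and is filtered out.
years = pd.Series([1965, 2003, 2004, 2004, 2005, 2005, 2005, 2005, 2006, 2006])

# Keep only sightings inside the 1970-2019 window (both ends inclusive).
in_window = years[years.between(1970, 2019)]

# Sightings per year in chronological order, plus year-over-year change in %.
per_year = in_window.value_counts().sort_index()
change = per_year.pct_change().mul(100).round(1)
print(pd.DataFrame({"sightings": per_year, "change_%": change}))
```

The first year in the window has no predecessor, so its change is reported as `NaN`.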